A Clustering Analysis Method With High Reliability Based on Wilcoxon-Mann-Whitney Testing
Abstract
As a core step in clustering analysis, distance measurement results can influence clustering accuracy. Existing methods are mostly based on cluster feature information. However, these features may be insufficient and can result in losing data information for clusters containing a number of objects. To improve accuracy, we make full use of the distribution characteristics of the objects in clusters, i.e., descriptive statistics and the Wilcoxon-Mann-Whitney rank sum test, a nonparametric test, to measure distances during clustering. Furthermore, we propose a two-stage algorithm to improve clustering analysis performance. The proposed method avoids preliminary assumptions and can discover clusters of arbitrary shapes. Experiments on multiple datasets, compared with other algorithms, illustrate the accuracy and efficiency of the algorithm.
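The idea of measuring the distance between two clusters from the ranks of their objects can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `mann_whitney_u` and `wmw_distance`, the restriction to one-dimensional values, and the rescaling of the U statistic to the range [0, 1] are all assumptions made here for illustration only.

```python
from itertools import product


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic of sample a versus sample b.

    Counts the pairs (x, y) with x from a and y from b where x > y;
    each tied pair contributes 0.5.
    """
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(a, b)
    )


def wmw_distance(a, b):
    """Hypothetical rank-based dissimilarity in [0, 1] (an assumption, not the paper's formula).

    U / (len(a) * len(b)) estimates P(X > Y) + 0.5 * P(X = Y).
    The value is 0 when neither sample tends to exceed the other
    and 1 when the two samples do not overlap at all.
    """
    u = mann_whitney_u(a, b)
    return abs(2.0 * u / (len(a) * len(b)) - 1.0)


# Toy one-dimensional clusters (made-up values for illustration).
cluster_a = [1.0, 1.2, 0.9, 1.1]
cluster_b = [1.05, 0.95, 1.15, 1.0]
cluster_c = [3.0, 3.2, 2.9, 3.1]

d_ab = wmw_distance(cluster_a, cluster_b)  # overlapping clusters
d_ac = wmw_distance(cluster_a, cluster_c)  # fully separated clusters
print(d_ab, d_ac)
```

Because the statistic depends only on pairwise orderings, it uses every object in both clusters rather than a summary such as a centroid, which is the motivation the abstract gives for a distribution-based distance. A merge step in a clustering procedure could, under these assumptions, prefer the pair of clusters with the smallest such value.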
Similar resources
When t-tests or Wilcoxon-Mann-Whitney tests won't do.
t-Tests are widely used by researchers to compare the average values of a numeric outcome between two groups. If there are doubts about the suitability of the data for the requirements of a t-test, most notably the distribution being non-normal, the Wilcoxon-Mann-Whitney test may be used instead. However, although often applied, both tests may be invalid when discrete and/or extremely skew data...
Combinatorics, Computer Algebra and Wilcoxon-Mann-Whitney Test
We show the combinatorics behind the Wilcoxon-Mann-Whitney two-sample test. This yields new combinatorial proofs of recurrences for its null distribution given recently by Brus and Chang, as well as new recurrences. It is shown how to convert these recurrences into generating functions. These generating functions are used to obtain closed expressions for the null distribution when one of the sa...
An improved ranked set two-sample Mann-Whitney-Wilcoxon test
The authors present an improved ranked set two-sample Mann-Whitney-Wilcoxon test for a location shift between samples from two distributions F and G. They define a function that measures the amount of information provided by each observation from the two samples, given the actual joint ranking of all the units in a set. This information function is used as a guide for improving the Pitman effic...
Optimizing Classifier Performance Via the Wilcoxon-Mann-Whitney Statistic
Cross entropy and mean squared error are typical cost functions used to optimize classifier performance. The goal of the optimization is usually to achieve the best correct classification rate. However, for many two-class real-world problems, the ROC curve is a more meaningful performance measure. We demonstrate that minimizing cross entropy or mean squared error does not necessarily maximize the...
The Mann Whitney Wilcoxon Distribution Using Linked Lists
We give an improved algorithm for calculating the exact null distribution of the two sample Mann Whitney Wilcoxon rank sum statistic. The algorithm modifies the update method of Smid using a minimal linked list which directs calculation of only those intermediate probabilities required for the final value. Using an efficient shortened representation of the list of required intermediate values, ...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3053244